Fuzzy Full-Text Searches in OCR Databases

Authors

  • Andreas Myka
  • Ulrich Güntzer
Abstract

Though the quality of optical character recognition (OCR) software is steadily improving, it is still far from perfect. As a result, full-text databases that are filled by means of OCR software contain many errors. These errors have to be taken into consideration when such databases are examined by means of full-text searches. In this chapter, we illustrate some of the possible methods that, to a certain extent, cope with the uncertainty of the database entries. These methods add fuzziness to precisely formulated queries in order to increase their recall. In addition, the described methods are compared to the method of matching query terms exactly: preliminary test results that show their effects on recall and precision are given.
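The abstract does not say which fuzzy matching methods the chapter uses, so the following is only a minimal sketch of the general idea, assuming Levenshtein edit distance as the fuzziness measure. The toy documents, the relevance set, and the function names (`edit_distance`, `search`, `recall_precision`) are hypothetical and chosen purely for illustration. The sketch compares exact term matching with a search that tolerates one character error and reports recall and precision for each.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed fuzziness measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def search(term: str, docs: dict[int, str], max_errors: int = 0) -> set[int]:
    """Return ids of documents containing a word within max_errors of term."""
    hits = set()
    for doc_id, text in docs.items():
        if any(edit_distance(term, word) <= max_errors
               for word in text.lower().split()):
            hits.add(doc_id)
    return hits


def recall_precision(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Recall = hits among relevant docs; precision = hits among retrieved docs."""
    true_positives = len(retrieved & relevant)
    recall = true_positives / len(relevant) if relevant else 1.0
    precision = true_positives / len(retrieved) if retrieved else 1.0
    return recall, precision


# Hypothetical OCR output: documents 2 and 3 contain misrecognized characters.
DOCS = {
    1: "fuzzy search in text databases",
    2: "fuzzy searcb tolerates ocr noise",  # OCR error: h read as b
    3: "a seareh over scanned pages",       # OCR error: c read as e
    4: "the starch content of rice",        # unrelated, but one edit from "search"
}
RELEVANT = {1, 2, 3}  # documents that truly mention "search"

exact_hits = search("search", DOCS, max_errors=0)
fuzzy_hits = search("search", DOCS, max_errors=1)

print("exact:", sorted(exact_hits), recall_precision(exact_hits, RELEVANT))
print("fuzzy:", sorted(fuzzy_hits), recall_precision(fuzzy_hits, RELEVANT))
```

On this toy data the exact search finds only document 1, while the fuzzy search also recovers the two OCR-damaged documents and additionally retrieves the unrelated document 4. That is the trade-off the abstract refers to: added fuzziness raises recall and can lower precision.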


Similar resources

Retrieval of Spelling Variants in Nonstandard Texts – Automated Support and Visualization

This article describes ongoing research in the RSNSR (Regelbasierte Suche in Textdatenbanken mit nichtstandardisierter Rechtschreibung, “Rule-based search in text databases with nonstandard orthography”) project. The focus of this project is making historical text documents digitally available; consequently, it examines the challenges for digitization procedures and subsequent retrieval operati...

Investigation on Full-Text Databases Cited in LIS

Background and Aim: The main objective of this research was to investigate the use of full-text databases in the LIS theses of Tehran State Universities within the years 2005 and 2009. Method: For this purpose, a total of 9952 citations related to 172 existing theses in the academic central libraries were studied. The data collected were analyzed by the bibliometrics and citation analysis met...

NF-SAVO: Neuro-Fuzzy system for Arabic Video OCR

In this paper we propose a robust approach for text extraction and recognition from video clips, called the Neuro-Fuzzy system for Arabic Video OCR. In Arabic video text recognition, a number of noise components make the text relatively more complicated to separate from the background. Further, the characters can be moving or presented in a diversity of colors, sizes and fonts that are n...

Systematic literature review of fuzzy logic based text summarization

"Information Overload" is not a new term, but with the massive development in technology, which enables anytime, anywhere, easy and unlimited access, participation in and publishing of information has consequently escalated its impact. Assisting users' informational searches with reduced reading surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...

TEVI: Text Extraction for Video Indexing

Efficient indexing and retrieval of digital video is an important aspect of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. In this paper, we describe a system for detecting and extracting text appearing in video frames. A supervised learning method based on color and edge information is used to detect text regions. After an uns...


Publication date: 1995